Rapid unit selection from a large speech corpus for concatenative speech synthesis

Authors

Marc C. Beutnagel

Mehryar Mohri

Michael Riley

Abstract

Concatenative Text-to-Speech (TTS) systems such as those described by Hunt and Black [6] can select at synthesis time from a very large number of recorded units. The selected units are chosen to minimize a combination of target and join costs for a given sentence. However, the join costs, in particular, can be quite expensive to compute, even when this computation has been optimized. If possible, we would avoid this computation by precomputing and caching all the possible join costs, but their number is prohibitive. Although the search space of possible joins is large, we have found that only a small fraction are selected in practice. By synthesizing a large quantity of text and logging the units actually selected, we were able to gather usage statistics and construct a practical and efficient cache of concatenation costs. Use of this cache dramatically decreased the runtime of the AT&T Next-Generation TTS system [1] with negligible effect on speech quality. Experiments show that by caching 0.7% of the possible joins, 99% of the join cost computations can be avoided.
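The caching idea in the abstract can be illustrated with a minimal sketch. It assumes that unit selections are logged as lists of integer unit IDs, and that a cache miss falls back to computing the join cost directly. The names `count_joins`, `build_join_cache`, and `CachedJoinCost` are hypothetical, and the cost function is supplied by the caller; none of this is taken from the AT&T system itself.

```python
from collections import Counter
from typing import Callable, Dict, Iterable, List, Tuple

Join = Tuple[int, int]  # (left unit id, right unit id); hypothetical representation


def count_joins(selected_sequences: Iterable[List[int]]) -> Counter:
    """Count how often each adjacent unit pair was selected in logged output."""
    counts: Counter = Counter()
    for units in selected_sequences:
        counts.update(zip(units, units[1:]))
    return counts


def build_join_cache(
    counts: Counter,
    join_cost: Callable[[int, int], float],
    max_entries: int,
) -> Dict[Join, float]:
    """Precompute costs for the most frequently selected joins only."""
    return {pair: join_cost(*pair) for pair, _ in counts.most_common(max_entries)}


class CachedJoinCost:
    """Look up a join cost in the cache; compute it on a miss (assumed fallback)."""

    def __init__(self, cache: Dict[Join, float], join_cost: Callable[[int, int], float]):
        self.cache = cache
        self.join_cost = join_cost
        self.hits = 0
        self.misses = 0

    def __call__(self, left: int, right: int) -> float:
        cost = self.cache.get((left, right))
        if cost is not None:
            self.hits += 1
            return cost
        self.misses += 1
        return self.join_cost(left, right)


if __name__ == "__main__":
    # Toy stand-in for an expensive acoustic join cost (an assumption for the demo).
    def toy_cost(left: int, right: int) -> float:
        return float(abs(left - right))

    logged = [[1, 2, 3], [1, 2, 4], [1, 2, 3]]
    cache = build_join_cache(count_joins(logged), toy_cost, max_entries=2)
    lookup = CachedJoinCost(cache, toy_cost)
    lookup(1, 2)  # frequent join, served from the cache
    lookup(2, 4)  # rare join, computed on demand
    print(lookup.hits, lookup.misses)
```

The hit and miss counters give a rough way to measure what fraction of join cost computations a cache of a given size avoids, which is the kind of figure the abstract reports.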


Similar resources

Vocalic sandwich, a unit designed for unit selection TTS

Unit selection text-to-speech systems currently produce very natural synthetic sentences by concatenating speech segments from a large database. Recently, increasing demand for designing high quality voices with less data creates need for further optimization of the textual corpus recorded by the speaker. The optimization process of this corpus is traditionally guided by the coverage rate of we...


Forward Masking Phenomenon in Concatenative Speech Synthesis

The approach described in the paper tries to get more knowledge to the concatenative text-to-speech system design. The knowledge is based on masking phenomenon of the inner ear, particularly of its temporal (forward) masking properties. Designing such knowledge-based system is suggested to use in the unit selection-based speech synthesis, as contemporary a prominent technique in concatenative s...


High-Individuality Voice Conversion Based on Concatenative Speech Synthesis

Concatenative speech synthesis is a method that can make speech sound which has naturalness and high-individuality of a speaker by introducing a large speech corpus. Based on this method, in this paper, we propose a voice conversion method whose conversion speech has high-individuality and naturalness. The authors also have two subjective evaluation experiments for evaluating individuality and ...


A Corpus-Based Concatenative Speech Synthesis System for Turkish

Speech synthesis is the process of converting written text into machine-generated synthetic speech. Concatenative speech synthesis systems form utterances by concatenating pre-recorded speech units. Corpus-based methods use a large inventory to select the units to be concatenated. In this paper, we design and develop an intelligible and natural sounding corpus-based concatenative speech synthes...


Prosody-based unit selection for Japanese speech synthesis

A corpus-based concatenative speech synthesis system using no signal processing can produce intelligible synthetic speech maintaining original voice characteristics. In such a concatenative system, it is very important to select appropriate waveform segments that are naturally close to the target prosody. But with a limited size database it can sometimes be difficult to realize natural prosody. T...



Publication date: 1999
